Abstract. We show that a very simple modification of the Pure Greedy Algorithm for approximating
functions by sparse sums from a dictionary in a Hilbert or, more generally, a Banach space
has optimal convergence rates on the class of convex combinations of dictionary elements.

1. Introduction

Greedy algorithms have been used quite extensively as a tool for generating approximations from
redundant families of functions, such as frames or more general dictionaries $\mathcal{D}$. Given a Banach
space $X$, a dictionary is any set $\mathcal{D}$ of norm one elements from $X$ whose span is dense in $X$. The
most natural greedy algorithm in a Hilbert space is the Pure Greedy Algorithm (PGA), which
is also known as Matching Pursuit, see [2] for the description of this and other algorithms. The
fact that the PGA lacks optimal convergence properties has led to a variety of modified greedy
algorithms such as the Relaxed Greedy Algorithm (RGA), the Orthogonal Greedy Algorithm, and
their weak versions. There are also analogues of these, developed for approximating functions in
Banach spaces, see [?].

The central issues in the study of these algorithms are their ease of implementation and their
approximation power, measured in terms of convergence rates. If $f_m$ is the output of a greedy
algorithm after $m$ iterations, then $f_m$ is a linear combination of at most $m$ dictionary elements.
Such linear combinations are said to be sparse of order $m$. The quality of the approximation is
measured by the decay of the error $\|f - f_m\|$ as $m \to \infty$, where $\|\cdot\|$ is the norm in the Hilbert
or Banach space, respectively. Of course, the decay rate of this error is governed by properties
of the target function $f$. The typical properties imposed on $f$ are that it is sparse, or more
generally, that it is in some way compressible. Here, compressible means that it can be written as
a (generally speaking, infinite) linear combination of dictionary elements with some restrictions on
the coefficients. The most frequently applied assumption on $f$ is that it is in the unit ball of the
class $\mathcal{A}_1(\mathcal{D})$, that is, the set of all functions which are a convex combination of dictionary elements
(provided we consider symmetric dictionaries). It is known that the elements in this class can be
approximated by $m$-sparse vectors to accuracy $O(m^{-1/2})$, see Theorem 2.1, and so this rate of
approximation serves as a benchmark for the performance of greedy algorithms.

It has been shown in [2] in the case of a Hilbert space that whenever $f \in \mathcal{A}_1(\mathcal{D})$, the output $f_m$ of
the PGA satisfies

$$\|f - f_m\| = O(m^{-1/6}), \quad m \to \infty. \tag{1.1}$$

Later results gave slight improvements of the above estimate. For example, in [?], the rate $O(m^{-1/6})$
was improved to $O(m^{-11/62})$. Based on the method from the latter paper, Sil'nichenko [9] then
showed a rate of $O(m^{-s/(2(2+s))})$, where $s$ solves a certain equation, and that $s/(2(2+s)) > 11/62$. Similar
estimates for the weak versions of the PGA can be found in [?]. Estimates for the error from
below have also been provided, see [?].

The fact that the PGA does not attain the optimal rate for approximating the elements in
$\mathcal{A}_1(\mathcal{D})$ has led to various modifications of this algorithm. Two of these modifications, the Relaxed
and the Orthogonal Greedy Algorithm, were shown to achieve the optimal rate $O(m^{-1/2})$, see [2].

The purpose of the present paper is to show that a very simple modification of the PGA, namely
just rescaling $f_m$ at each iteration, already leads to the improved convergence rate $O(m^{-1/2})$ for
functions in $\mathcal{A}_1(\mathcal{D})$. The rescaling we suggest is simply the orthogonal projection of $f$ onto $f_m$.
We call this modified algorithm a Rescaled Pure Greedy Algorithm (RPGA) and prove optimal
convergence rates for its weak version in Hilbert and Banach spaces. In a subsequent paper, see [1],
we show that this strategy can also be applied successfully for developing an algorithm for convex
optimization.

The paper is organized as follows. In Section 2 we spell out our notation and recall some simple known
facts related to greedy algorithms. In Section 3 we present the RPGA for a Hilbert space and prove the
above convergence rate. The remaining parts of this paper consider a modification of this algorithm
for Banach spaces and weak versions of this algorithm.

2. Notation and Preliminaries 

We denote by $H$ a Hilbert space and by $X$ a Banach space with $\|\cdot\|$ being the norm in these
spaces, respectively. A set of functions $\mathcal{D} \subset H$ (or $X$) is called a dictionary if $\|\varphi\| = 1$ for every
$\varphi \in \mathcal{D}$ and the closure of $\mathrm{span}(\mathcal{D})$ is $H$ (or $X$). An example of a dictionary is any Schauder basis
for $H$ (or $X$). However, the main idea behind dictionaries is to cover redundant families such as
frames. A common example of dictionaries is the union of several Schauder bases.

The set $\Sigma_m$ consists of all $m$-sparse elements with respect to the dictionary $\mathcal{D}$, namely

$$\Sigma_m := \Sigma_m(\mathcal{D}) = \Big\{ g : g = \sum_{\varphi \in \Lambda} c_\varphi \varphi, \ \Lambda \subset \mathcal{D}, \ |\Lambda| \le m \Big\}.$$

Here, we use the notation $|\Lambda|$ to denote the cardinality of the index set $\Lambda$. For a general element $f$
from $X$, we define the error of approximation

$$\sigma_m(f) := \sigma_m(f, \mathcal{D}) := \inf_{g \in \Sigma_m} \|f - g\|$$

of $f$ by elements from $\Sigma_m$. The rate of decay of $\sigma_m(f)$ as $m \to \infty$ says how well $f$ can be
approximated by sparse elements.

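For a general dictionary $\mathcal{D} \subset H$ (or $X$), we define the class of functions

$$\mathcal{A}_1^o(\mathcal{D}, M) := \Big\{ f = \sum_{k \in \Lambda} c_k(f) \varphi_k : \ \varphi_k \in \mathcal{D}, \ |\Lambda| < \infty, \ \sum_{k \in \Lambda} |c_k(f)| \le M \Big\},$$

and by $\mathcal{A}_1(\mathcal{D}, M)$ its closure in $H$ (or $X$). Then, $\mathcal{A}_1(\mathcal{D})$ is defined to be the union of the classes
$\mathcal{A}_1(\mathcal{D}, M)$ over all $M > 0$. For $f \in \mathcal{A}_1(\mathcal{D})$, we define the "semi-norm" of $f$ as

$$|f|_{\mathcal{A}_1(\mathcal{D})} := \inf\{M : f \in \mathcal{A}_1(\mathcal{D}, M)\}.$$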

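A fundamental result for approximating $\mathcal{A}_1(\mathcal{D})$ is the following, see [2].

Theorem 2.1. For a general dictionary $\mathcal{D} \subset H$ and $f \in \mathcal{A}_1(\mathcal{D}) \subset H$, we have

$$\sigma_m(f, \mathcal{D}) \le c\,|f|_{\mathcal{A}_1(\mathcal{D})}\, m^{-1/2}, \quad m = 1, 2, \ldots.$$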

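When analyzing the convergence of greedy algorithms, we will use the following lemma, proved
in [8].

Lemma 2.2. Let $\ell > 0$, $r > 0$, $B > 0$, and let $\{a_m\}_{m=1}^{\infty}$ and $\{r_m\}_{m=2}^{\infty}$ be sequences of non-negative
numbers satisfying the inequalities

$$a_1 \le B, \qquad a_{m+1} \le a_m \Big(1 - \frac{r_{m+1}}{r}\, a_m^{\ell}\Big), \quad m = 1, 2, \ldots.$$

Then, we have

$$a_m \le \max\{1, \ell^{-1/\ell}\}\, r^{1/\ell} \Big( r B^{-\ell} + \sum_{k=2}^{m} r_k \Big)^{-1/\ell}, \quad m = 2, 3, \ldots. \tag{2.1}$$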

We note that several similar versions of this lemma have been proved and used in the analysis of
greedy algorithms, see [10].
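As a numerical illustration of Lemma 2.2, the following sketch builds the extremal sequence (equality in the hypotheses) and compares it with the right-hand side of the estimate. It assumes the estimate has the form $a_m \le \max\{1, \ell^{-1/\ell}\}\, r^{1/\ell}(rB^{-\ell} + \sum_{k=2}^m r_k)^{-1/\ell}$; the function names `lemma_bound` and `extremal_sequence` and the parameter values are illustrative choices, not part of the paper.

```python
def lemma_bound(m, ell, r, B, r_seq):
    """Right-hand side of the assumed estimate; r_seq[k] holds r_k for k >= 2."""
    const = max(1.0, ell ** (-1.0 / ell)) * r ** (1.0 / ell)
    total = r * B ** (-ell) + sum(r_seq[k] for k in range(2, m + 1))
    return const * total ** (-1.0 / ell)


def extremal_sequence(n, ell, r, B, r_seq):
    """Sequence with equality in the hypotheses:
    a_1 = B and a_{m+1} = a_m * (1 - r_{m+1} * a_m**ell / r)."""
    a = {1: B}
    for m in range(1, n):
        a[m + 1] = a[m] * (1.0 - r_seq[m + 1] * a[m] ** ell / r)
    return a


# Illustrative parameters (hypothetical): ell = 1, r = 4, B = 1, r_k = 1.
n, ell, r, B = 50, 1.0, 4.0, 1.0
r_seq = {k: 1.0 for k in range(2, n + 1)}
a = extremal_sequence(n, ell, r, B, r_seq)

# gaps[m - 2] = bound(m) - a_m; every entry should be non-negative.
gaps = [lemma_bound(m, ell, r, B, r_seq) - a[m] for m in range(2, n + 1)]
```

With $\ell = 1$ the bound reduces to $r\,(r/B + \sum_{k=2}^m r_k)^{-1}$, which is the form used for the Hilbert space estimates.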


3. The Hilbert space case 

In order to show the simplicity of our results, we begin with the standard case of the RPGA 
in a Hilbert space. Later, we treat the case of Banach spaces and weak algorithms, but the reader 
familiar with this topic will see that the results in these more general settings follow by standard 
modifications of the results from this section. We denote the inner product in the Hilbert space $H$
by $\langle \cdot, \cdot \rangle$, and so the norm of $f \in H$ is $\|f\| = \langle f, f \rangle^{1/2}$.

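The RPGA($\mathcal{D}$) is defined by the following simple steps.


RPGA($\mathcal{D}$):

• Step 0: Define $f_0 := 0$.

• Step m:

• Assuming $f_{m-1}$ has been computed and $f_{m-1} \ne f$, choose a direction $\varphi_m \in \mathcal{D}$ such that

$$|\langle f - f_{m-1}, \varphi_m \rangle| = \sup_{\varphi \in \mathcal{D}} |\langle f - f_{m-1}, \varphi \rangle|.$$

With

$$\lambda_m := \langle f - f_{m-1}, \varphi_m \rangle, \qquad \hat f_m := f_{m-1} + \lambda_m \varphi_m, \qquad s_m := \frac{\langle f, \hat f_m \rangle}{\|\hat f_m\|^2},$$

define the next approximant to be

$$f_m = s_m \hat f_m.$$

• If $f = f_m$, stop the algorithm and define $f_k = f_m = f$, for $k > m$.

• If $f \ne f_m$, proceed to Step $m + 1$.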


Note that if the output at each Step $m$ were $\hat f_m$ and not $f_m = s_m \hat f_m$, this would be the PGA.
However, the new algorithm uses not $\hat f_m$, but the best approximation to $f$ from the one dimensional
space $\mathrm{span}\{\hat f_m\}$, that is $s_m \hat f_m$. Adding this step, which is just an appropriate scaling of the output of
the PGA, allows us to prove the optimal convergence rate of $m^{-1/2}$ for the proposed algorithm.
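The steps of the RPGA can be sketched numerically as follows. This is a minimal illustration that assumes a finite dictionary in $\mathbb{R}^n$ stored as the unit-norm columns of a NumPy array; the function name `rpga`, the stopping tolerance, and the random test data are assumptions made for the example, not part of the paper.

```python
import numpy as np


def rpga(f, D, n_iter):
    """Rescaled Pure Greedy Algorithm sketch for a finite dictionary in R^n.

    f      : target vector, shape (n,)
    D      : array of shape (n, N) whose columns are unit-norm atoms
    n_iter : maximal number of greedy steps
    Returns the last approximant and the list of errors e_0, e_1, ...
    """
    f = np.asarray(f, dtype=float)
    fm = np.zeros_like(f)
    errors = [float(np.linalg.norm(f))]
    for _ in range(n_iter):
        residual = f - fm
        inner = D.T @ residual                # <f - f_{m-1}, phi> for every atom
        j = int(np.argmax(np.abs(inner)))     # greedy direction phi_m
        lam = inner[j]                        # lambda_m
        if abs(lam) < 1e-14:                  # residual numerically zero: stop
            break
        f_hat = fm + lam * D[:, j]            # plain PGA update
        s = (f @ f_hat) / (f_hat @ f_hat)     # project f onto span{f_hat}
        fm = s * f_hat                        # rescaled approximant f_m
        errors.append(float(np.linalg.norm(f - fm)))
    return fm, errors


# Hypothetical test data: 40 random unit atoms in R^10, convex weights (M = 1).
rng = np.random.default_rng(0)
D = rng.standard_normal((10, 40))
D /= np.linalg.norm(D, axis=0)
c = rng.random(40)
c /= c.sum()
f = D @ c
fm, errors = rpga(f, D, 25)
```

Dropping the two rescaling lines (using `fm = f_hat` instead) turns the sketch into the PGA, so the only additional cost per step is one inner product and one norm.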

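Next, we show that the RPGA and the Relaxed Greedy Algorithm (RGA) provide different
sequences of approximants $\{f_m\}$ and $\{f_m^r\}$, respectively, and thus the RPGA is different from the
greedy algorithms known so far. For both algorithms

$$f_0 = f_0^r = 0, \qquad f_1 = f_1^r = \langle f, \varphi_1 \rangle \varphi_1,$$

where $\varphi_1 \in \mathcal{D}$ is such that $|\langle f, \varphi_1 \rangle| = \sup_{\varphi \in \mathcal{D}} |\langle f, \varphi \rangle|$. For both RPGA and RGA, the next
element $\varphi_2 \in \mathcal{D}$ is chosen as $|\langle f - f_1, \varphi_2 \rangle| = \sup_{\varphi \in \mathcal{D}} |\langle f - f_1, \varphi \rangle|$. One can easily compute that the
next approximant, generated by the RPGA, is

$$f_2 = s_2 f_1 + s_2 \langle f - f_1, \varphi_2 \rangle \varphi_2, \qquad s_2 = \frac{\langle f, \varphi_1 \rangle^2 + \langle f, \varphi_2 \rangle^2 - \langle f, \varphi_1 \rangle \langle f, \varphi_2 \rangle \langle \varphi_1, \varphi_2 \rangle}{\langle f, \varphi_1 \rangle^2 + \langle f, \varphi_2 \rangle^2 - \langle f, \varphi_1 \rangle^2 \langle \varphi_1, \varphi_2 \rangle^2},$$

while the classical RGA would give

$$f_2^r = \frac{1}{2} f_1 + \frac{1}{2} \varphi_2.$$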


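There are some modifications of the RGA, see [1], where the approximant at Step $m$ is determined
not as

$$f_m^r = \Big(1 - \frac{1}{m}\Big) f_{m-1}^r + \frac{1}{m} \varphi_m, \quad \text{where } |\langle f - f_{m-1}^r, \varphi_m \rangle| = \sup_{\varphi \in \mathcal{D}} |\langle f - f_{m-1}^r, \varphi \rangle|,$$

but as

$$f_m^r = (1 - \alpha_m) f_{m-1}^r + \alpha_m \varphi_m, \tag{3.1}$$

where $\alpha_m$ and $\varphi_m$ are the solutions of the minimization problem

$$\min_{\alpha \in [0,1],\ \varphi \in \mathcal{D}} \|f - ((1 - \alpha) f_{m-1}^r + \alpha \varphi)\|.$$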

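While the sequence generated by the RPGA is a linear combination of $f_{m-1}$ and $\varphi_m$, that is

$$f_m = s_m f_{m-1} + \lambda_m s_m \varphi_m,$$

it is different from the convex combinations (3.1) from other variations of the RGA, as described
in [?], and from the best approximation to $f$ from $\mathrm{span}\{f_{m-1}, \varphi_m\}$. For example, the best
approximation to $f$ from $\mathrm{span}\{f_1, \varphi_2\}$ is

$$f_2^o = \frac{\langle f, \varphi_1 \rangle - \langle f, \varphi_2 \rangle \langle \varphi_1, \varphi_2 \rangle}{\langle f, \varphi_1 \rangle (1 - \langle \varphi_1, \varphi_2 \rangle^2)}\, f_1 + \frac{\langle f, \varphi_2 \rangle - \langle f, \varphi_1 \rangle \langle \varphi_1, \varphi_2 \rangle}{1 - \langle \varphi_1, \varphi_2 \rangle^2}\, \varphi_2,$$

and again $f_2 \ne f_2^o$. In summary, we can view the new algorithm either as a rescaled version of the
PGA or a new modification of the RGA.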
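For the reader's convenience, the rescaling factor of the second RPGA step can be verified by a direct computation. The shorthand $a$, $b$, $c$ below is introduced only for this worked example and is not notation from the paper.

```latex
% Shorthand (assumption of this example only):
%   a := <f, phi_1>,  b := <f, phi_2>,  c := <phi_1, phi_2>,  ||phi_1|| = ||phi_2|| = 1.
\begin{align*}
f_1 &= a\,\varphi_1, \qquad
\lambda_2 = \langle f - f_1, \varphi_2\rangle = b - ac, \qquad
\hat f_2 = a\,\varphi_1 + (b - ac)\,\varphi_2, \\
\langle f, \hat f_2\rangle &= a^2 + (b - ac)\,b = a^2 + b^2 - abc, \\
\|\hat f_2\|^2 &= a^2 + 2a(b - ac)\,c + (b - ac)^2 = a^2 + b^2 - a^2c^2, \\
s_2 &= \frac{\langle f, \hat f_2\rangle}{\|\hat f_2\|^2}
     = \frac{a^2 + b^2 - abc}{a^2 + b^2 - a^2c^2}.
\end{align*}
```

In particular, $s_2 = 1$ exactly when $abc = a^2c^2$, that is, when $ac = 0$ or $b = ac$; otherwise the RPGA output differs from the PGA output already at the second step.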

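We continue with the following theorem.

Theorem 3.1. If $f \in \mathcal{A}_1(\mathcal{D}) \subset H$, then the output $(f_m)_{m \ge 0}$ of the RPGA($\mathcal{D}$) satisfies

$$e_m := \|f - f_m\| \le |f|_{\mathcal{A}_1(\mathcal{D})}\, m^{-1/2}, \quad m = 1, 2, \ldots. \tag{3.2}$$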

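Proof: Since $f_m$ is the orthogonal projection of $f$ onto the one dimensional space spanned by $\hat f_m$,
we have

$$\langle f - f_m, f_m \rangle = 0, \quad m \ge 0. \tag{3.3}$$

Next, note that the definition of $\hat f_m$ and the choice of $\lambda_m$ give

$$\begin{aligned}
\|f - \hat f_m\|^2 &= \langle f - f_{m-1} - \lambda_m \varphi_m,\ f - f_{m-1} - \lambda_m \varphi_m \rangle \\
&= \|f - f_{m-1}\|^2 - 2\lambda_m \langle f - f_{m-1}, \varphi_m \rangle + \lambda_m^2 \|\varphi_m\|^2 \\
&= \|f - f_{m-1}\|^2 - \langle f - f_{m-1}, \varphi_m \rangle^2,
\end{aligned} \tag{3.4}$$

where we have used that $\|\varphi_m\| = 1$. Now, assume $f \ne f_{m-1}$. Since $f_m$ is the orthogonal projection
of $f$ onto $\mathrm{span}\{\hat f_m\}$, we have

$$e_m^2 = \|f - f_m\|^2 = \|f - s_m \hat f_m\|^2 \le \|f - \hat f_m\|^2.$$

We combine the latter inequality and (3.4) to derive that

$$e_m^2 \le e_{m-1}^2 - \langle f - f_{m-1}, \varphi_m \rangle^2, \quad m = 1, 2, \ldots. \tag{3.5}$$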

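We proceed with an estimate from below for $|\langle f - f_{m-1}, \varphi_m \rangle|$. Note that

$$e_{m-1}^2 = \|f - f_{m-1}\|^2 = \langle f - f_{m-1}, f - f_{m-1} \rangle = \langle f - f_{m-1}, f \rangle, \tag{3.6}$$

where we have used (3.3).

It is enough to prove (3.2) for functions $f$ that are finite sums $f = \sum_j c_j \varphi_j$ with $\sum_j |c_j| \le M$,
since these functions are dense in $\mathcal{A}_1(\mathcal{D}, M)$. Let us fix $\epsilon > 0$ and choose a representation for
$f = \sum_{\varphi \in \mathcal{D}} c_\varphi \varphi$ such that

$$\sum_{\varphi \in \mathcal{D}} |c_\varphi| < M + \epsilon.$$

It follows from (3.6) that

$$e_{m-1}^2 = \sum_{\varphi \in \mathcal{D}} c_\varphi \langle f - f_{m-1}, \varphi \rangle
\le |\langle f - f_{m-1}, \varphi_m \rangle| \sum_{\varphi \in \mathcal{D}} |c_\varphi|
< |\langle f - f_{m-1}, \varphi_m \rangle| (M + \epsilon),$$

where we have used the choice of $\varphi_m$. We let $\epsilon \to 0$ and obtain the inequality

$$M^{-1} e_{m-1}^2 \le |\langle f - f_{m-1}, \varphi_m \rangle|. \tag{3.7}$$

We combine (3.5) and (3.7) to obtain

$$e_m^2 \le e_{m-1}^2 - M^{-2} e_{m-1}^4 = e_{m-1}^2 (1 - M^{-2} e_{m-1}^2), \quad m \ge 2.$$

Note that

$$\|f\|^2 = \langle f, f \rangle = \sum_{\varphi \in \mathcal{D}} c_\varphi \langle f, \varphi \rangle
\le |\langle f, \varphi_1 \rangle| \sum_{\varphi \in \mathcal{D}} |c_\varphi| < \|f\| (M + \epsilon),$$

and therefore $\|f\| \le M$. Since $e_1^2 \le e_0^2 = \|f\|^2 \le M^2$, we can apply Lemma 2.2 with $a_m = e_m^2$,
$B = M^2$, $r_m := 1$, $r = M^2$, and $\ell = 1$. Then, (2.1) gives

$$e_m^2 \le M^2 m^{-1}, \quad m \ge 2,$$

and the theorem follows. □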

In the sections that follow, we introduce variants of the RPGA and prove convergence results
similar to Theorem 3.1.


4. The Weak Rescaled Pure Greedy Algorithm for Hilbert spaces 

In this section, we describe the Weak Rescaled Pure Greedy Algorithm (WRPGA). It is determined
by a weakness sequence $\{t_k\}_{k=1}^{\infty}$, where all $t_k \in (0, 1]$, and the dictionary $\mathcal{D}$. We denote it
by WRPGA($\{t_k\}, \mathcal{D}$).


WRPGA($\{t_k\}, \mathcal{D}$):

• Step 0: Define $f_0 = 0$.

• Step m:

• Assuming $f_{m-1}$ has been computed and $f_{m-1} \ne f$, choose a direction $\varphi_m \in \mathcal{D}$ such that

$$|\langle f - f_{m-1}, \varphi_m \rangle| \ge t_m \sup_{\varphi \in \mathcal{D}} |\langle f - f_{m-1}, \varphi \rangle|.$$

With

$$\lambda_m := \langle f - f_{m-1}, \varphi_m \rangle, \qquad \hat f_m := f_{m-1} + \lambda_m \varphi_m, \qquad s_m := \frac{\langle f, \hat f_m \rangle}{\|\hat f_m\|^2},$$

define the next approximant to be

$$f_m = s_m \hat f_m.$$

• If $f = f_{m-1}$, stop the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.

• If $f \ne f_m$, proceed to Step $m + 1$.

In the case when all elements of the weakness sequence are $t_k = 1$, this algorithm is the
RPGA($\mathcal{D}$). The following theorem holds.

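Theorem 4.1. If $f \in \mathcal{A}_1(\mathcal{D}) \subset H$, then the output $(f_m)_{m \ge 0}$ of the WRPGA($\{t_k\}, \mathcal{D}$) satisfies

$$e_m := \|f - f_m\| \le |f|_{\mathcal{A}_1(\mathcal{D})} \Big( \sum_{k=1}^{m} t_k^2 \Big)^{-1/2}, \quad m \ge 1. \tag{4.1}$$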

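Proof: The proof is similar to the one of Theorem 3.1, where we show that for the error $e_m^2 =
\|f - f_m\|^2$, we have the inequality

$$e_m^2 \le e_{m-1}^2 - \langle f - f_{m-1}, \varphi_m \rangle^2, \quad m = 1, 2, \ldots. \tag{4.2}$$

The estimate from below for $|\langle f - f_{m-1}, \varphi_m \rangle|$ is derived similarly as

$$M^{-1} t_m e_{m-1}^2 \le |\langle f - f_{m-1}, \varphi_m \rangle|, \tag{4.3}$$

where we have used the definition of $\varphi_m$. Next, it follows from (4.2) and (4.3) that

$$e_m^2 \le e_{m-1}^2 - M^{-2} t_m^2 e_{m-1}^4 = e_{m-1}^2 (1 - M^{-2} t_m^2 e_{m-1}^2), \quad m \ge 1.$$

Note that

$$\|f\|^2 = \langle f, f \rangle = \sum_{\varphi \in \mathcal{D}} c_\varphi \langle f, \varphi \rangle
\le t_1^{-1} |\langle f, \varphi_1 \rangle| \sum_{\varphi \in \mathcal{D}} |c_\varphi| < t_1^{-1} \|f\| (M + \epsilon),$$

and therefore $\|f\| \le M t_1^{-1}$. Since $e_1^2 \le e_0^2 = \|f\|^2 \le M^2 t_1^{-2}$, we can apply Lemma 2.2 with
$a_m = e_m^2$, $B = M^2 t_1^{-2}$, $r_m := t_m^2$, $r = M^2$, and $\ell = 1$ to obtain

$$e_m^2 \le M^2 \Big( t_1^2 + \sum_{k=2}^{m} t_k^2 \Big)^{-1}, \quad m \ge 2,$$

and the theorem follows. □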
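The weak selection rule of this section can be sketched in the same finite-dimensional setting. The sketch assumes a finite dictionary stored as unit-norm columns and accepts the first atom that passes the weak threshold; that tie-breaking choice, the name `wrpga`, and the test data are assumptions of the example, since any admissible atom is allowed by the algorithm.

```python
import numpy as np


def wrpga(f, D, t, n_iter):
    """Weak RPGA sketch for a finite dictionary in R^n.

    t[m - 1] in (0, 1] is the weakness parameter t_m used at step m.
    Returns the last approximant and the list of errors e_0, e_1, ...
    """
    f = np.asarray(f, dtype=float)
    fm = np.zeros_like(f)
    errors = [float(np.linalg.norm(f))]
    for m in range(1, n_iter + 1):
        inner = D.T @ (f - fm)                 # <f - f_{m-1}, phi> for every atom
        best = np.max(np.abs(inner))
        if best < 1e-14:                       # residual numerically zero: stop
            break
        # any atom with |<f - f_{m-1}, phi>| >= t_m * sup is admissible;
        # here we simply take the first one
        j = int(np.flatnonzero(np.abs(inner) >= t[m - 1] * best)[0])
        f_hat = fm + inner[j] * D[:, j]
        s = (f @ f_hat) / (f_hat @ f_hat)      # rescaling step
        fm = s * f_hat
        errors.append(float(np.linalg.norm(f - fm)))
    return fm, errors


# Hypothetical test data: convex weights (M = 1), constant weakness t_k = 0.5.
rng = np.random.default_rng(1)
D = rng.standard_normal((10, 40))
D /= np.linalg.norm(D, axis=0)
c = rng.random(40)
c /= c.sum()
f = D @ c
t = [0.5] * 25
fm, errors = wrpga(f, D, t, 25)
```

With `t = [1.0] * n_iter` the admissible set contains only maximizers, and the sketch reduces to the RPGA selection rule.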


5. The Banach space case 


In this section, we will state the RPGA($\mathcal{D}$) algorithm for Banach spaces $X$ with norm $\|\cdot\|$ and
dictionary $\mathcal{D}$, and prove convergence results for certain Banach spaces. Let us first start with the
introduction of the modulus of smoothness $\rho$ of a Banach space $X$, which is defined as

$$\rho(u) := \sup_{f, g \in X,\ \|f\| = \|g\| = 1} \Big\{ \frac{1}{2} \big( \|f + ug\| + \|f - ug\| \big) - 1 \Big\}, \quad u > 0.$$

In this paper, we shall consider only Banach spaces $X$ whose modulus of smoothness satisfies the
inequality

$$\rho(u) \le \gamma u^q, \quad 1 < q \le 2, \quad \gamma \text{ constant}.$$

This is a natural assumption, since the modulus of smoothness of $X = L_p$, $1 < p < \infty$, for example,
is known to satisfy such an inequality. Recall that, see [3], for $X = L_p$,

$$\rho(u) \le \begin{cases} \dfrac{1}{p}\, u^p, & \text{if } 1 \le p \le 2, \\[2mm] \dfrac{p-1}{2}\, u^2, & \text{if } 2 \le p < \infty. \end{cases}$$

Next, for every element $f \in X$, $f \ne 0$, we consider its norming functional $F_f \in X^*$ with the
properties $\|F_f\| = 1$, $F_f(f) = \|f\|$. Note that if $X = H$ is a Hilbert space, the norming functional
for $f \in H$ is

$$F_f(\cdot) = \frac{\langle f, \cdot \rangle}{\|f\|}.$$

There is a relationship between the norming functional $F_g$ for any $g \in X$, $g \ne 0$, and the modulus
of smoothness of $X$, given by the following lemma.

Lemma 5.1. Let $X$ be a Banach space with modulus of smoothness $\rho$, where $\rho(u) \le \gamma u^q$, $1 < q \le 2$.
Let $g \in X$, $g \ne 0$ with norming functional $F_g$. Then, for every $h \in X$, we have

$$\|g + uh\| \le \|g\| + u F_g(h) + 2\gamma u^q \|g\|^{1-q} \|h\|^q, \quad u > 0. \tag{5.1}$$

Proof: The proof follows from Lemma 6.1 in [?] and the property of the modulus of smoothness.
□

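We next present the RPGA($\mathcal{D}$) for the Banach space $X$ with dictionary $\mathcal{D}$.

RPGA($\mathcal{D}$):

• Step 0: Define $f_0 = 0$.

• Step m:

• Assuming $f_{m-1}$ has been computed and $f \ne f_{m-1}$, choose a direction $\varphi_m \in \mathcal{D}$ such that

$$|F_{f - f_{m-1}}(\varphi_m)| = \sup_{\varphi \in \mathcal{D}} |F_{f - f_{m-1}}(\varphi)|.$$

With

$$\lambda_m = \mathrm{sign}\{F_{f - f_{m-1}}(\varphi_m)\}\, \|f - f_{m-1}\|\, (2\gamma q)^{\frac{1}{1-q}}\, |F_{f - f_{m-1}}(\varphi_m)|^{\frac{1}{q-1}}, \qquad \hat f_m := f_{m-1} + \lambda_m \varphi_m,$$

choose $s_m$ such that

$$\|f - s_m \hat f_m\| = \min_{s \in \mathbb{R}} \|f - s \hat f_m\|,$$

and define the next approximant to be

$$f_m = s_m \hat f_m.$$

• If $f = f_{m-1}$, stop the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.

• If $f \ne f_m$, proceed to Step $m + 1$.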
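The particular value of $\lambda_m$ can be motivated by minimizing the upper bound supplied by Lemma 5.1. The following worked computation is a sketch of that reasoning; the auxiliary function $\Phi$ and the shorthand $F := F_{f - f_{m-1}}(\varphi_m)$, $e := \|f - f_{m-1}\|$ are introduced only for this example.

```latex
% Apply (5.1) with g = f - f_{m-1}, h = -sign(F) phi_m, ||phi_m|| = 1, and u > 0:
\begin{align*}
\|(f - f_{m-1}) - u\,\mathrm{sign}(F)\,\varphi_m\|
  &\le e - u\,|F| + 2\gamma\, u^{q} e^{1-q} =: \Phi(u), \\
\Phi'(u) = -|F| + 2\gamma q\, u^{q-1} e^{1-q} = 0
  \quad&\Longleftrightarrow\quad
  u^{*} = e\,(2\gamma q)^{\frac{1}{1-q}}\,|F|^{\frac{1}{q-1}} = |\lambda_m|, \\
\Phi(u^{*}) = e - u^{*}|F|\Big(1 - \frac{1}{q}\Big)
  &= e - \frac{q-1}{q}\,(2\gamma q)^{\frac{1}{1-q}}\; e\; |F|^{\frac{q}{q-1}} .
\end{align*}
```

For $q = 2$ and $\gamma = 1/2$, which is the Hilbert space case, this gives $|\lambda_m| = e\,|F|/2$, so the step differs from the Hilbert space choice $\lambda_m = \langle f - f_{m-1}, \varphi_m \rangle = e\,F$ only by a constant factor, consistent with the factor 2 in the last term of (5.1).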

The following lemma holds.

Lemma 5.2. Let $X$ be a Banach space with modulus of smoothness $\rho$, $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Let
$f_{m-1}$ be the output of the RPGA($\mathcal{D}$) at Step $m - 1$. Then, if $f \ne f_{m-1}$, we have

$$F_{f - f_{m-1}}(f_{m-1}) = 0.$$

Proof: Let us denote by $L := \mathrm{span}\{\hat f_{m-1}\} \subset X$. Clearly, $f_{m-1} \in L$, and moreover, $f_{m-1}$ is the
best approximation to $f$ from $L$. We apply Lemma 6.9 from [?] to the linear space $L$ and the
vector $f_{m-1}$, and derive the lemma. □

The next theorem provides the convergence rate for the new algorithm in Banach spaces.

Theorem 5.3. Let $X$ be a Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. If
$f \in \mathcal{A}_1(\mathcal{D}) \subset X$, then the output $(f_m)_{m \ge 0}$ of the RPGA($\mathcal{D}$) satisfies

$$e_m := \|f - f_m\| \le c\,|f|_{\mathcal{A}_1(\mathcal{D})}\, m^{1/q - 1}, \quad m \ge 2, \tag{5.2}$$

where $c = c(\gamma, q)$.

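Proof: Clearly, we have $e_0 = \|f - f_0\| = \|f\|$. At Step $m$, $m = 1, 2, \ldots$, of the algorithm, either
$f = f_{m-1}$, in which case $f_k = f_{m-1}$, $k \ge m$, and therefore $e_m = 0$, or we have

$$e_m = \|f - f_m\| = \|f - s_m \hat f_m\| \le \|f - \hat f_m\| = \|(f - f_{m-1}) - \lambda_m \varphi_m\|.$$

We now apply Lemma 5.1 to the latter inequality with $g = f - f_{m-1} \ne 0$, $u = |\lambda_m| > 0$, $h =
-\mathrm{sign}\{\lambda_m\} \varphi_m$, and derive

$$\begin{aligned}
e_m &\le \|f - f_{m-1}\| - |\lambda_m|\, |F_{f - f_{m-1}}(\varphi_m)| + 2\gamma |\lambda_m|^q \|f - f_{m-1}\|^{1-q} \|\varphi_m\|^q \\
&= e_{m-1} - |\lambda_m|\, |F_{f - f_{m-1}}(\varphi_m)| + 2\gamma |\lambda_m|^q e_{m-1}^{1-q} \\
&= e_{m-1} - \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} e_{m-1} |F_{f - f_{m-1}}(\varphi_m)|^{\frac{q}{q-1}},
\end{aligned} \tag{5.3}$$

where we have used that $\|\varphi_m\| = 1$ and the choice of $\lambda_m$. Now, we need an estimate from below
for $|F_{f - f_{m-1}}(\varphi_m)|$. Using Lemma 5.2, we obtain that

$$e_{m-1} = \|f - f_{m-1}\| = F_{f - f_{m-1}}(f - f_{m-1}) = F_{f - f_{m-1}}(f). \tag{5.4}$$

As in the Hilbert space case, it is enough to consider functions $f$ that are finite sums $f = \sum_j c_j \varphi_j$
with $\sum_j |c_j| \le M$, since these functions are dense in $\mathcal{A}_1(\mathcal{D}, M)$. Let us fix $\epsilon > 0$ and choose a
representation for $f = \sum_{\varphi \in \mathcal{D}} c_\varphi \varphi$ such that

$$\sum_{\varphi \in \mathcal{D}} |c_\varphi| < M + \epsilon.$$

It follows that

$$F_{f - f_{m-1}}(f) = \sum_{\varphi \in \mathcal{D}} c_\varphi F_{f - f_{m-1}}(\varphi)
\le |F_{f - f_{m-1}}(\varphi_m)| \sum_{\varphi \in \mathcal{D}} |c_\varphi| < |F_{f - f_{m-1}}(\varphi_m)| (M + \epsilon).$$

We take $\epsilon \to 0$ and derive

$$F_{f - f_{m-1}}(f) \le M\, |F_{f - f_{m-1}}(\varphi_m)|.$$

The latter estimate and (5.4) provide the estimate from below

$$M^{-1} e_{m-1} \le |F_{f - f_{m-1}}(\varphi_m)|,$$

which together with (5.3) result in

$$e_m \le e_{m-1} \Big( 1 - \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} M^{-\frac{q}{q-1}} e_{m-1}^{\frac{q}{q-1}} \Big).$$

Note that $e_1 \le e_0 = \|f\| \le M$, since

$$\|f\| = F_f(f) = \sum_{\varphi} c_\varphi F_f(\varphi) \le |F_f(\varphi_1)| \sum_{\varphi} |c_\varphi| < M + \epsilon,$$

for every $\epsilon > 0$. We now use Lemma 2.2 with $a_m = e_m$, $B = M$, $r_m := \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}}$, $r = M^{\frac{q}{q-1}}$,
and $\ell = \frac{q}{q-1}$ to obtain

$$e_m \le M \Big( 1 + \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} (m - 1) \Big)^{\frac{1}{q} - 1}, \quad m \ge 2,$$

and the theorem follows. □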
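To make the Banach space algorithm concrete, here is a minimal sketch in the finite-dimensional space $\ell_p^n$ with $p \ge 2$. It assumes the values $q = 2$ and $\gamma = (p-1)/2$ (the discrete analogue of the $L_p$ bound on the modulus of smoothness), uses the explicit norming functional of $\ell_p^n$, and replaces the exact one-dimensional minimization over $s$ by a ternary search. All function names and the test data are illustrative assumptions, not part of the paper.

```python
import numpy as np


def lp_norm(x, p):
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))


def norming_functional(g, p):
    """Coefficient vector w of F_g on l_p^n (g nonzero): F_g(h) = w @ h,
    with dual norm 1 and F_g(g) = ||g||_p."""
    return np.sign(g) * np.abs(g) ** (p - 1) / lp_norm(g, p) ** (p - 1)


def best_scale(f, g, p, iters=200):
    """Ternary search for s minimizing the convex function s -> ||f - s g||_p."""
    bound = 2.0 * lp_norm(f, p) / lp_norm(g, p)   # the minimizer lies in [-bound, bound]
    lo, hi = -bound, bound
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if lp_norm(f - a * g, p) <= lp_norm(f - b * g, p):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


def rpga_lp(f, D, p, n_iter):
    """RPGA sketch in l_p^n, p >= 2, with assumed q = 2 and gamma = (p - 1) / 2.
    Columns of D are atoms of unit l_p norm."""
    q, gamma = 2.0, (p - 1.0) / 2.0
    f = np.asarray(f, dtype=float)
    fm = np.zeros_like(f)
    errors = [lp_norm(f, p)]
    for _ in range(n_iter):
        e = errors[-1]
        if e < 1e-12:                              # f reached numerically: stop
            break
        F = norming_functional(f - fm, p) @ D      # F_{f - f_{m-1}}(phi) for all atoms
        j = int(np.argmax(np.abs(F)))
        if F[j] == 0.0:
            break
        lam = (np.sign(F[j]) * e * (2.0 * gamma * q) ** (1.0 / (1.0 - q))
               * abs(F[j]) ** (1.0 / (q - 1.0)))
        f_hat = fm + lam * D[:, j]
        s = best_scale(f, f_hat, p)                # approximate rescaling step
        fm = s * f_hat
        errors.append(lp_norm(f - fm, p))
    return fm, errors


# Hypothetical test data in l_4^8 with 30 random atoms and convex weights.
p = 4.0
rng = np.random.default_rng(2)
D = rng.standard_normal((8, 30))
D /= np.array([lp_norm(D[:, k], p) for k in range(D.shape[1])])
c = rng.random(30)
c /= c.sum()
f = D @ c
fm, errors = rpga_lp(f, D, p, 20)
```

The ternary search is only one possible way to carry out the minimization over $s$; any one-dimensional convex minimizer could be substituted without changing the rest of the sketch.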





6. The Weak Rescaled Pure Greedy Algorithm for Banach spaces 

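In this section, we describe the Weak Rescaled Pure Greedy Algorithm for Banach spaces. It is
determined by a weakness sequence $\{t_k\}_{k=1}^{\infty}$, where all $t_k \in (0, 1]$, and the dictionary $\mathcal{D}$. As in the
Hilbert case, we denote it by WRPGA($\{t_k\}, \mathcal{D}$).


WRPGA($\{t_k\}, \mathcal{D}$):

• Step 0: Define $f_0 = 0$.

• Step m:

• Assuming $f_{m-1}$ has been computed and $f \ne f_{m-1}$, choose a direction $\varphi_m \in \mathcal{D}$ such that

$$|F_{f - f_{m-1}}(\varphi_m)| \ge t_m \sup_{\varphi \in \mathcal{D}} |F_{f - f_{m-1}}(\varphi)|.$$

With

$$\lambda_m = \mathrm{sign}\{F_{f - f_{m-1}}(\varphi_m)\}\, \|f - f_{m-1}\|\, (2\gamma q)^{\frac{1}{1-q}}\, |F_{f - f_{m-1}}(\varphi_m)|^{\frac{1}{q-1}}, \qquad \hat f_m := f_{m-1} + \lambda_m \varphi_m,$$

choose $s_m$ such that

$$\|f - s_m \hat f_m\| = \min_{s \in \mathbb{R}} \|f - s \hat f_m\|,$$

and define the next approximant to be

$$f_m = s_m \hat f_m.$$

• If $f = f_{m-1}$, stop the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.

• If $f \ne f_m$, proceed to Step $m + 1$.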

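Next, we present the convergence rates for the WRPGA($\{t_k\}, \mathcal{D}$) in Banach spaces.

Theorem 6.1. Let $X$ be a Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. If
$f \in \mathcal{A}_1(\mathcal{D}) \subset X$, then the output $(f_m)_{m \ge 0}$ of the WRPGA($\{t_k\}, \mathcal{D}$) satisfies

$$e_m := \|f - f_m\| \le c\,|f|_{\mathcal{A}_1(\mathcal{D})} \Big( \sum_{k=1}^{m} t_k^{\frac{q}{q-1}} \Big)^{\frac{1}{q} - 1}, \quad m \ge 1, \tag{6.1}$$

where $c = c(\gamma, q)$.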

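Proof: As in the proof of Theorem 5.3, we show that

$$e_m \le e_{m-1} - \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} e_{m-1} |F_{f - f_{m-1}}(\varphi_m)|^{\frac{q}{q-1}}. \tag{6.2}$$

Next, similarly to Theorem 5.3, we prove an estimate from below for $|F_{f - f_{m-1}}(\varphi_m)|$, which is

$$M^{-1} t_m e_{m-1} \le |F_{f - f_{m-1}}(\varphi_m)|,$$

which together with (6.2) result in

$$e_m \le e_{m-1} \Big( 1 - \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}}\, t_m^{\frac{q}{q-1}} M^{-\frac{q}{q-1}} e_{m-1}^{\frac{q}{q-1}} \Big).$$

Again, since $e_1 \le e_0 = \|f\| \le M t_1^{-1}$, we can use Lemma 2.2 with $a_m = e_m$, $B = M t_1^{-1}$,
$r_m := \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} t_m^{\frac{q}{q-1}}$, $r = M^{\frac{q}{q-1}}$, and $\ell = \frac{q}{q-1}$. Then, (2.1) gives

$$e_m \le M \Big( t_1^{\frac{q}{q-1}} + \frac{q-1}{q} (2\gamma q)^{\frac{1}{1-q}} \sum_{k=2}^{m} t_k^{\frac{q}{q-1}} \Big)^{\frac{1}{q} - 1}, \quad m \ge 2,$$

and the theorem follows. □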